Abstract. This paper contains two main results concerning chains of infinite order which are not necessarily continuous. The first one is a decomposition of the transition probability kernel as a countable mixture of unbounded probabilistic context trees. This decomposition is used to design a simulation algorithm which works as a combination of the algorithms given by Comets et al. (2002) and Gallo (2009). The second main result gives sufficient conditions on the kernel for this algorithm to stop after an almost surely finite number of steps. Direct consequences of this last result are the existence and uniqueness of the stationary chain compatible with the kernel.



1. Introduction 

The goal of this paper is to construct a perfect simulation scheme for chains of 
infinite order on a countable alphabet, compatible with a transition probability kernel 
which is not necessarily continuous. By a perfect simulation algorithm we mean an 
algorithm which samples precisely from the stationary law of the process. 



Perfect simulation for chains of infinite order was first achieved by Comets et al. (2002) under the continuity assumption. They used the fact (observed earlier by Kalikow (1990)) that under this assumption, the transition probability kernel can be decomposed as a countable mixture of Markov kernels. Then, Gallo (2009) obtained a perfect simulation algorithm for chains compatible with a class of unbounded probabilistic context trees where each infinite size branch can be a discontinuity point.

In this paper, we consider a class of transition probability kernels which are neither necessarily continuous nor necessarily probabilistic context trees. In fact, the same infinite size branches as in the context trees considered by Gallo (2009) are allowed to be discontinuity points, and the other branches must satisfy a certain localized continuity assumption. Under these new assumptions, we obtain a Kalikow-type decomposition of our kernels as a mixture of unbounded probabilistic context trees. The fact that our decomposition involves unbounded probabilistic context trees instead of Markov kernels (as was the case for Kalikow (1990)) seems to be "the price to pay" for allowing discontinuities at some points.

As a consequence of this decomposition and a minimal extra condition, we show that there exists at least one stationary chain compatible with our kernels, extending the existing result stating that continuity is sufficient. A perfect simulation algorithm is then constructed using this decomposition together with the coupling from the past (CFTP) method introduced in the seminal paper of Propp & Wilson (1996). One of the main consequences of the existence of a perfect simulation algorithm is that there exists a unique stationary chain compatible with our kernels.
More precise explanations of what is done here are postponed to Section 3, since we need the notation and definitions given in Section 2. Our first main result, Theorem 4.1, which is stated and proved in Section 4, gives the decomposition, which holds without the continuity condition. In Section 5, we explain how our perfect simulation works using Theorem 4.1 and we present it in the form of pseudo-code, Algorithm 1. After that, we state our second main theorem, Theorem 5.1, which says that Algorithm 1 stops almost surely after a finite number of steps. Section 6 is dedicated to the proof of Theorem 5.1. We finish this paper with some comments and further questions.

2. Notation and definitions 

Let $A$ be a countable alphabet. Given two integers $m\le n$, we denote by $a_m^n$ the string $a_m\ldots a_n$ of symbols in $A$. For any $m\le n$, the length of the string $a_m^n$ is denoted by $|a_m^n|$ and is defined by $|a_m^n| = n-m+1$. For any $n\in\mathbb Z$, we will use the convention that $a_{n+1}^n = \emptyset$, and naturally $|a_{n+1}^n| = 0$. Given two strings $v$ and $v'$, we denote by $vv'$ the string of length $|v|+|v'|$ obtained by concatenating the two strings. The concatenation of strings is also extended to the case where $v$ denotes a semi-infinite sequence, that is $v = \ldots a_{-2}a_{-1}$, $a_{-j}\in A$ for $j\ge1$. If $n$ is a positive integer and $v$ a finite string of symbols in $A$, we denote by $v^n = v\ldots v$ the concatenation of $n$ times the string $v$. We denote

$$A^{-\mathbb N} = A^{\{\ldots,-2,-1\}}\quad\text{and}\quad A^\star = \bigcup_{j=0}^{+\infty}A^{\{-j,\ldots,-1\}},$$

which are, respectively, the set of all infinite strings of past symbols and the set of all finite strings of past symbols. The case $j=0$ corresponds to the empty string $\emptyset$. Finally, we denote by $\underline a = \ldots a_{-2}a_{-1}$ the elements of $A^{-\mathbb N}$.

2.1. Standard definitions. A transition probability kernel (or simply kernel in the sequel) on an alphabet $A$ is a function

$$P : A\times A^{-\mathbb N}\to[0,1],\qquad (a,\underline a)\mapsto P(a|\underline a) \qquad (1)$$

such that

$$\sum_{a\in A}P(a|\underline a) = 1,\qquad\forall\,\underline a\in A^{-\mathbb N}.$$

In this paper, we consider kernels $P$ which depend on an unbounded part of the past, unlike in the Markovian case. A stationary stochastic chain $X = (X_n)_{n\in\mathbb Z}$ on $A$ having law $\mu$ is said to be compatible with a kernel $P$ if the latter is a regular version of the conditional probabilities of the former, that is

$$\mu(X_0 = a\mid X_{-\infty}^{-1} = \underline a) = P(a|\underline a) \qquad (2)$$

for every $a\in A$ and $\mu$-almost every $\underline a$ in $A^{-\mathbb N}$. We call these chains chains of infinite memory.

2.2. Probabilistic context tree. We say that a kernel $P$ has a probabilistic context tree representation if there exists a function $d : A^{-\mathbb N}\to\mathbb N\cup\{+\infty\}$ such that for any two infinite sequences of past symbols $\underline a$ and $\underline b$

It follows that the length $d(\underline a)$ only depends on the suffix $a_{-d(\underline a)}^{-1}$ of $\underline a$. This allows us to identify the set $\tau := \{a_{-d(\underline a)}^{-1}\}_{\underline a\in A^{-\mathbb N}}$ with the set of leaves of a rooted tree where each node has either $|A|$ sons (internal node) or 0 sons (leaf). The set $\tau$ is called the context tree, "context" being the original name Rissanen (1983) gave to the strings $a_{-d(\underline a)}^{-1}$ when he introduced this model. We write $c_\tau(\underline a) := a_{-d(\underline a)}^{-1}$ for the unique element of $\tau$ which is a suffix of $\underline a$. A probabilistic context tree is an ordered pair $(\tau,p)$ where $\tau$ is a context tree and $p := \{p(a|v)\}_{a\in A,v\in\tau}$ is a set of transition probabilities associated to each element of $\tau$. Thus, the probabilistic context tree $(\tau,p)$ represents the kernel $P$ if for any $\underline a\in A^{-\mathbb N}$ and any $a\in A$

$$P(a|\underline a) = p(a|c_\tau(\underline a)).$$



Examples of probabilistic context trees are shown in Figures 1(a) (for the bounded case) and 1(b) (for the unbounded case). In the first one, to each leaf (context) of the tree we associate three boxes giving the transition probabilities to each symbol of $A$ given this context. In the second one, we only specify the probability $p_i := p(2|1^i2)$ (observe that we swap the order when we write the context in a conditioning); the transition probabilities to 1 are simply $1-p_i$.

Figure 1. Examples of probabilistic context trees: (a) bounded case; (b) unbounded case.

A stochastic chain $X$ compatible, in the sense of (2), with a probabilistic context tree is called a chain with variable length memory.

3. Motivation 

The aim of this section is twofold: it motivates and, at the same time, explains the present work.

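3.1. Countable mixture of Markov kernels under the continuity assumption. We say that a point (an infinite sequence of past symbols) $\underline a$ is a continuity point for a given transition probability kernel $P$ if

$$\beta_k(\underline a) := \sup_{a\in A}\sup_{\underline y,\underline z}\big|P(a|a_{-k}^{-1}\underline y)-P(a|a_{-k}^{-1}\underline z)\big|\xrightarrow[k\to\infty]{}0.$$

Otherwise, we say that $\underline a$ is a discontinuity point for $P$. $P$ is said to be continuous if

$$\beta_k := \sup_{\underline a}\beta_k(\underline a)\xrightarrow[k\to\infty]{}0. \qquad (3)$$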



Kalikow (1990) showed that continuous transition probability kernels $P$ can be represented in the form of a countable mixture of Markov kernels, that is, there exist two probability distributions $\{p_0^{CFF}(a)\}_{a\in A}$ and $\{\lambda_k^{CFF}\}_{k\ge0}$ and a sequence of Markov kernels $\{P_k^{CFF}\}_{k\ge1}$ such that for any $a\in A$ and $\underline z\in A^{-\mathbb N}$

$$P(a|\underline z) = \lambda_0^{CFF}p_0^{CFF}(a)+\sum_{k\ge1}\lambda_k^{CFF}P_k^{CFF}(a|z_{-k}^{-1}). \qquad (4)$$

The superscript "CFF" refers to the fact that we use the definitions from Comets et al. (2002).

Define an $\mathbb N$-valued random variable $K^{CFF}$ taking value $k\ge0$ with probability $\lambda_k^{CFF}$. Decomposition (4) means the following. To choose the next symbol looking at the whole past $\underline z$ using the distribution $\{P(a|\underline z)\}_{a\in A}$ is equivalent to the following two-step procedure:

(1) choose $K^{CFF}$,

(2) (a) if $K^{CFF} = 0$, then choose the next symbol with probability $\{p_0^{CFF}(a)\}_{a\in A}$,

(b) if $K^{CFF} = k>0$, then choose the next symbol looking at $z_{-k}^{-1}$ and using $\{P_k^{CFF}(a|z_{-k}^{-1})\}_{a\in A}$.

Observe that $K^{CFF}$ is independent of everything (in particular, it does not depend on $\underline z$).
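The two-step procedure can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the mixture is truncated to finitely many weights `lam` (decomposition (4) may involve infinitely many), pasts are stored as lists with the most recent symbol last, and `majority_markov` is a hypothetical family of Markov kernels on $A=\{1,2\}$, not one derived from a particular $P$.

```python
import random
from typing import Callable, Dict, List, Sequence

Dist = Dict[int, float]


def sample_next_cff(past: Sequence[int],
                    lam: List[float],
                    p0: Dist,
                    markov: Callable[[int, Sequence[int]], Dist],
                    rng: random.Random) -> int:
    """Draw the next symbol through the two-step procedure behind (4).

    lam[k] plays the role of lambda_k^CFF (truncated), p0 of p_0^CFF, and
    markov(k, tail) of P_k^CFF(. | z_{-k}^{-1}); past[-1] is z_{-1}.
    """
    # Step (1): K^CFF is drawn without looking at the past.
    k = rng.choices(range(len(lam)), weights=lam)[0]
    if k == 0:
        dist = p0                      # step (2a): the past is ignored
    else:
        dist = markov(k, past[-k:])    # step (2b): only z_{-k}^{-1} is inspected
    symbols = list(dist)
    return rng.choices(symbols, weights=[dist[a] for a in symbols])[0]


def majority_markov(k: int, tail: Sequence[int]) -> Dist:
    """Hypothetical P_k^CFF on A = {1, 2}: favours the majority symbol of the tail."""
    ones = sum(1 for s in tail if s == 1)
    return {1: 0.8, 2: 0.2} if 2 * ones >= len(tail) else {1: 0.2, 2: 0.8}
```

With `lam = [1.0]` the past is never read, which is the case $K^{CFF}=0$ with probability one; the random index is drawn before, and independently of, the inspection of the past.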

To clarify the parallel between this decomposition and the decomposition presented in Theorem 4.1, we explain how Comets et al. (2002) define the distribution $\{\lambda_k^{CFF}\}_{k\ge0}$. For any $a\in A$ and $a_{-k}^{-1}\in A^k$ they consider the functions

$$\alpha_0^{CFF}(a) := \inf_{\underline z}P(a|\underline z)\quad\text{and}\quad\alpha_k^{CFF}(a|a_{-k}^{-1}) := \inf_{\underline z}P(a|a_{-k}^{-1}\underline z)$$

and the sequence $\{\alpha_k^{CFF}\}_{k\ge0}$ defined by $\alpha_0^{CFF} := \sum_{a\in A}\alpha_0^{CFF}(a)$ and, for any $k\ge1$,

$$\alpha_k^{CFF} := \inf_{a_{-k}^{-1}\in A^k}\sum_{a\in A}\alpha_k^{CFF}(a|a_{-k}^{-1}).$$

These are, as they say, "probabilistic thresholds for memories limited to $k$ preceding instants." Taking the infimum over every $a_{-k}^{-1}$ is related to the continuity assumption (3). In fact, to assume continuity is equivalent to assume that $\alpha_k^{CFF}$ goes to 1 as $k$ diverges, and to assume continuity at the point $\underline a$ is equivalent to assume that

$$\alpha_k^{CFF}(\underline a) := \sum_{a\in A}\alpha_k^{CFF}(a|a_{-k}^{-1})$$

goes to 1 as $k$ diverges. Under the continuity assumption (3), the probability distribution $\{\lambda_k^{CFF}\}_{k\ge0}$ used in (4) is defined as follows: $\lambda_k^{CFF} := \alpha_k^{CFF}-\alpha_{k-1}^{CFF}$ for $k\ge1$ and $\lambda_0^{CFF} := \alpha_0^{CFF}$.
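As a sanity check of these definitions, $\alpha_k^{CFF}$ and $\lambda_k^{CFF}$ can be computed by brute force when $P$ happens to depend only on the last `order` symbols, because the infimum over $\underline z$ then ranges over finitely many completions. The sketch below assumes a finite alphabet and such a finite-order kernel; `toy_P` is a hypothetical order-2 kernel on $\{1,2\}$ chosen only for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple

Past = Tuple[int, ...]          # oldest symbol first, past[-1] = a_{-1}


def cff_thresholds(A: List[int], order: int,
                   P: Callable[[int, Past], float]) -> List[float]:
    """alpha_k^CFF for k = 0..order, assuming P depends on the last `order` symbols."""
    alphas = []
    for k in range(order + 1):
        worst = float("inf")
        for suffix in product(A, repeat=k):                 # the known part a_{-k}^{-1}
            total = 0.0
            for a in A:
                # infimum over the unknown part z of the past (finitely many choices)
                total += min(P(a, z + suffix) for z in product(A, repeat=order - k))
            worst = min(worst, total)
        alphas.append(worst)
    return alphas


def cff_weights(alphas: List[float]) -> List[float]:
    """lambda_0 = alpha_0 and lambda_k = alpha_k - alpha_{k-1} for k >= 1."""
    return [alphas[0]] + [alphas[k] - alphas[k - 1] for k in range(1, len(alphas))]


def toy_P(a: int, past: Past) -> float:
    """Hypothetical order-2 kernel on {1, 2}."""
    p2 = 0.2 + 0.3 * (past[-1] == 2) + 0.1 * (past[-2] == 2)
    return p2 if a == 2 else 1.0 - p2
```

For `toy_P` this yields $\alpha^{CFF} = (0.6, 0.9, 1)$ and hence $\lambda^{CFF} = (0.6, 0.3, 0.1)$, up to floating-point rounding.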

3.2. Without the continuity assumption. To fix ideas, in the remainder of this section, assume $P$ is a transition probability kernel on $A = \{1,2\}$ which has a single discontinuity point, namely $1^{-\mathbb N}$. Then $\alpha_k^{CFF}(\underline a)$ goes to 1 as $k$ diverges if and only if $\underline a\ne1^{-\mathbb N}$. In this case, $\alpha_k^{CFF}$ does not converge to 1 and the above result does not apply.



3.2.1. The context tree assumption. Gallo (2009) assumed that $P$ is represented by the probabilistic context tree $(\tau,p)$, where

$$\tau = \{1^{-\mathbb N}\}\cup\bigcup_{i\ge0}\ \bigcup_{a_{-f(i)}^{-1}\in A^{f(i)}}\{a_{-f(i)}^{-1}21^i\},$$

$f:\mathbb N\to\mathbb N$ being a deterministic function. This context tree is represented in Figure 2. Observe that, under this assumption, for any $i\ge0$ and $k\ge f(i)$ we have

$$P(a|1^i2a_{-k}^{-1}\underline z) = P(a|1^i2a_{-k}^{-1}\underline y),\quad\text{for any }\underline z\text{ and }\underline y\in A^{-\mathbb N}. \qquad (5)$$

It follows that

$$\inf_{a_{-k}^{-1}\in A^k}\sum_{a\in A}\alpha_{k+i+1}^{CFF}(a|1^i2a_{-k}^{-1}) = 1$$

whenever $k\ge f(i)$.

Figure 2.

Therefore $\sum_{a\in A}\alpha_k^{CFF}(a|a_{-k}^{-1})$ goes to 1 as $k$ diverges for any $\underline a\ne1^{-\mathbb N}$. We do not specify what happens for the point $\underline a = 1^{-\mathbb N}$. Making a parallel with the above case, we can decompose such a kernel as follows: for any $a\in A$ and $\underline z\in A^{-\mathbb N}$

$$P(a|\underline z) = \lambda_0^{CFF}p_0^{CFF}(a)+(1-\lambda_0^{CFF})\,p'(a|c_\tau(\underline z))$$

where

$$p'(a|c_\tau(\underline z)) := \frac{p(a|c_\tau(\underline z))-\lambda_0^{CFF}p_0^{CFF}(a)}{1-\lambda_0^{CFF}}.$$

Define an $\mathbb N$-valued random variable $K^\tau$ which takes value 0 with probability $\lambda_0^{CFF}$ or $|c_\tau(\underline z)|$ with probability $1-\lambda_0^{CFF}$. The context tree assumption for $P$ means the following. To choose the next symbol looking at the whole past $\underline z$ using the distribution $\{P(a|\underline z)\}_{a\in A}$ is equivalent to the following two-step procedure:

(1) choose $K^\tau$,

(2) (a) if $K^\tau = 0$, choose the next symbol with probability $\{p_0^{CFF}(a)\}_{a\in A}$,

(b) if $K^\tau = |c_\tau(\underline z)|$, choose the next symbol looking at $c_\tau(\underline z)$ and using $\{p'(a|c_\tau(\underline z))\}_{a\in A}$.

Observe that the random variable $K^\tau$ is a deterministic function of the past $\underline z$ whenever its value is not 0: $K^\tau = |c_\tau(\underline z)|$.
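For this tree, the nonzero value of $K^\tau$ can be read off a finite past. A minimal Python sketch, assuming $A=\{1,2\}$, pasts stored oldest symbol first, and a user-supplied (hypothetical) function `f`; it returns `None` when the available past does not yet determine the context, which is always the case along the branch $1^{-\mathbb N}$.

```python
from typing import Callable, Optional, Sequence


def gallo_context_length(past: Sequence[int], f: Callable[[int], int]) -> Optional[int]:
    """|c_tau(past)| for tau = {1^inf} U {a_{-f(i)}^{-1} 2 1^i}, or None if undetermined."""
    i = 0
    # length of the run of 1s at the most recent end of the past
    while i < len(past) and past[-1 - i] == 1:
        i += 1
    if i == len(past):
        return None                # no 2 seen yet: the past may lie on the branch 1^inf
    length = f(i) + 1 + i          # f(i) older symbols, then the 2, then i ones
    return length if length <= len(past) else None
```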

3.2.2. The countable mixture of probabilistic context trees. So far, two extreme cases have been considered: $K^\tau$ is a deterministic function of the past, and $K^{CFF}$ is a random variable totally independent of the past. In the present work, we introduce a way to combine these two approaches. It allows us to consider kernels $P$ which are neither necessarily represented by a probabilistic context tree, nor necessarily continuous. This new approach is based on the assumption that

$$\alpha_k := \inf_{i\ge0}\ \inf_{a_{-k}^{-1}\in A^k}\ \sum_{a\in A}\alpha_{k+i+1}^{CFF}(a|1^i2a_{-k}^{-1})\xrightarrow[k\to\infty]{}1. \qquad (6)$$

The $\alpha_k$'s are "probabilistic thresholds for memories going until the $k$th instant preceding the last occurrence of symbol 2 in the past." In this case also, $\sum_{a\in A}\alpha_k^{CFF}(a|a_{-k}^{-1})$ goes to 1 as $k$ diverges for any $\underline a\ne1^{-\mathbb N}$, and not necessarily for $1^{-\mathbb N}$. Notice that the probabilistic context tree assumption we introduced in Section 3.2.1 only guarantees that, for each fixed $i\ge0$,

$$\inf_{a_{-k}^{-1}\in A^k}\sum_{a\in A}\alpha_{k+i+1}^{CFF}(a|1^i2a_{-k}^{-1})\xrightarrow[k\to\infty]{}1.$$

Under assumption (6), it will be shown in the next section that there exist nonnegative weights $\{\lambda_k\}_{k\ge0}$ with $\lambda_0^{CFF}+\sum_{k\ge0}\lambda_k = 1$ and a sequence of probabilistic context trees $\{(\tau_k,p_k)\}_{k\ge0}$ such that

$$P(a|\underline z) = \lambda_0^{CFF}p_0^{CFF}(a)+\sum_{k\ge0}\lambda_kp_k(a|c_{\tau_k}(\underline z)). \qquad (7)$$

The $k$th context tree of decomposition (7) is given by

$$\tau_k := \{1^{-\mathbb N}\}\cup\bigcup_{i\ge0}\ \bigcup_{a_{-k}^{-1}\in A^k}\{a_{-k}^{-1}21^i\}. \qquad (8)$$

The sequence of context trees $\{\tau_k\}_{k\ge0}$ for the present particular case is illustrated in Figure 3. Define a random variable $K$ taking value 0 with probability $\lambda_0^{CFF}$ and $|c_{\tau_k}(\underline z)|$ with probability $\lambda_k$, for $k\ge0$. Once more, let us translate this decomposition into a two-step procedure:

(1) choose $K$,

(2) (a) if $K = 0$, choose the next symbol with probability $\{p_0^{CFF}(a)\}_{a\in A}$,

(b) if $K = |c_{\tau_k}(\underline z)|$ for some $k\ge0$, choose the next symbol looking at $c_{\tau_k}(\underline z)$ and using $\{p_k(a|c_{\tau_k}(\underline z))\}_{a\in A}$.

Observe that this time, the random variable $K$ depends on the past $\underline z$, but through a random mechanism using the distribution $\{\lambda_k\}_{k\ge0}$.
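The random mechanism behind $K$ can be sketched in Python as follows: first an index is drawn with the weights $(\lambda_0^{CFF},\lambda_0,\lambda_1,\ldots)$, then the corresponding tree $\tau_k$ of (8) is read on the past. The weights are assumed truncated to a finite list, $A=\{1,2\}$, pasts are stored oldest symbol first, and `None` signals that the finite past does not yet determine the context; the function names are ours.

```python
import random
from typing import List, Optional, Sequence


def tau_k_context_length(past: Sequence[int], k: int) -> Optional[int]:
    """|c_{tau_k}(past)| for tau_k = {1^inf} U {a_{-k}^{-1} 2 1^i}; None if undetermined."""
    i = 0
    while i < len(past) and past[-1 - i] == 1:   # trailing run of 1s
        i += 1
    if i == len(past):
        return None                              # still compatible with 1^inf
    length = k + 1 + i                           # k older symbols, the 2, i ones
    return length if length <= len(past) else None


def draw_memory(past: Sequence[int], lam0: float, lam: List[float],
                rng: random.Random) -> Optional[int]:
    """Step (1): K = 0 w.p. lam0, otherwise |c_{tau_k}(past)| with k drawn from lam."""
    idx = rng.choices(range(len(lam) + 1), weights=[lam0] + lam)[0]
    if idx == 0:
        return 0
    return tau_k_context_length(past, idx - 1)
```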

Figure 3. The context trees $\tau_0,\tau_1,\tau_2,\ldots,\tau_k$ defined by (8).

In the next section, we state our first main result in a general framework. The alphabet can be countable and the role played above by symbol 2 can be played by any finite string $w\in A^\star$. In this case, we allow $P$ to have discontinuities at every point $\underline z\in A^{-\mathbb N}$ which does not have $w$ as a substring.

4. First main result: a countable mixture of unbounded probabilistic context trees

4.1. Some more definitions and statement of the first results. Fix a finite string $w\in A^\star$ and define the function $m^w$ which associates to any string $a_{-m}^{-1}\in A^m$, $|w|\le m\le+\infty$, the distance to the first occurrence of $w$ when we look backward in $a_{-m},\ldots,a_{-1}$, that is

$$m^w(a_{-m}^{-1}) = \inf\{k\ge0 : a_{-k-|w|}^{-k-1} = w\}, \qquad (9)$$

with the convention $m^w(a_{-m}^{-1}) = +\infty$ if the set of indexes is empty. Using this definition, we introduce

$$I^k(w) := \{(a_{-k},\ldots,a_{-1}) : \underline a\in A^{-\mathbb N}\text{ and }m^w(\underline a) = k\},$$

which is the set of strings $v$ of length $k$ such that there is a unique occurrence of $w$ in the concatenation $wv$. For $w,v\in A^\star$, $|w|\le|v|$, we use the abuse of notation $w\in v$ (resp. $w\notin v$), which means "$w$ is (resp. is not) a substring of $v$". Then, for any string $w\in A^\star$ and $k\ge|w|$, we denote the set of the strings of length $k$ in which $w$ does not appear as a substring by

$$A^k(w) := \{a_{-k}^{-1}\in A^k : a_i^{i+|w|-1}\ne w,\ i = -k,\ldots,-|w|\}.$$

Its complement is denoted by $\bar A^k(w) := A^k\setminus A^k(w)$. Finally, $A^{-\mathbb N}(w)$ denotes the set of infinite sequences of past symbols $\underline a$ such that $w\notin\underline a$, and $\bar A^{-\mathbb N}(w) := A^{-\mathbb N}\setminus A^{-\mathbb N}(w)$. Observe that $I^k(w)$ can be different from $A^k(w)$.
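The quantities $m^w$, $I^k(w)$ and $A^k(w)$ are easy to compute for finite strings. The following Python sketch uses names of our own choosing, stores strings oldest symbol first, and uses `None` for $+\infty$; the pair $w=11$, $v=12$ illustrates the last remark, since $v\in A^2(w)$ while $v\notin I^2(w)$.

```python
from typing import Optional, Sequence


def m_w(past: Sequence[int], w: Sequence[int]) -> Optional[int]:
    """m^w(a_{-m}^{-1}) as in (9); None stands for +infinity."""
    n, L = len(past), len(w)
    for k in range(n - L + 1):
        # a_{-k-|w|}^{-k-1} sits at list positions n-k-L, ..., n-k-1
        if list(past[n - k - L:n - k]) == list(w):
            return k
    return None


def in_I(v: Sequence[int], w: Sequence[int]) -> bool:
    """v in I^{|v|}(w): in the concatenation w v, w occurs only at the start."""
    return m_w(list(w) + list(v), w) == len(v)


def in_A(v: Sequence[int], w: Sequence[int]) -> bool:
    """v in A^{|v|}(w): w is not a substring of v."""
    return m_w(v, w) is None
```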

Theorem 4.1. Consider a transition probability kernel $P$ such that

$$\alpha_k^w := \inf_{i\ge0}\ \inf_{b_{-i}^{-1}\in I^i(w)}\ \inf_{c_{-k}^{-1}\in A^k}\ \sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)\xrightarrow[k\to\infty]{}1. \qquad (10)$$

Then, there exist two probability distributions $\{\lambda_k^w\}_{k\ge-1}$ and $\{p_{-1}^w(a)\}_{a\in A}$, and a sequence of probabilistic context trees $\{(\tau_k^w,p_k^w)\}_{k\ge0}$ such that

$$P(a|\underline z) = \lambda_{-1}^wp_{-1}^w(a)+\sum_{k\ge0}\lambda_k^wp_k^w(a|c_{\tau_k^w}(\underline z)). \qquad (11)$$

Corollary 4.1. Under the same conditions as in Theorem 4.1, if

$$\inf_{a}\inf_{\underline a}P(a|\underline a)>0, \qquad (12)$$

then there exists at least one stationary chain compatible with $P$ in the sense of (2).

Corollary 4.1 follows from Theorem 4.1 by the same arguments used in the proof of Theorem 11 in Kalikow (1990). The decomposition of $P$ as a mixture of probabilistic context trees together with assumption (12) provides the necessary features to show that the limit of the empirical measures is in fact compatible with $P$.

Since the assumption of Theorem 4.1 is not intuitive, let us give sufficient conditions for Theorem 4.1 to hold.



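Proposition 4.1. The condition

$$\inf_{a_{-k}^{-1}\in A^k}\sum_{a\in A}\inf_{\underline z}P(a|a_{-k}^{-1}\underline z)\xrightarrow[k\to\infty]{}1 \qquad (13)$$

implies that

$$\inf_{a_{-k}^{-1}\in\bar A^k(w)}\sum_{a\in A}\inf_{\underline z}P(a|a_{-k}^{-1}\underline z)\xrightarrow[k\to\infty]{}1, \qquad (14)$$

which implies that $\alpha_k^w$ converges to 1 as $k$ diverges.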

A consequence of this proposition is that the assumption of Theorem 4.1 is weaker than the continuity assumption. It follows, in particular, that the existence result of Corollary 4.1 extends the well-known fact that existence holds whenever the transition probability kernel is continuous.

The proofs of Theorem 4.1 and Proposition 4.1 are given in Section 4.3. Theorem 4.1 is based on the existence of a triplet of parameters (which is not unique): two probability distributions $\{\lambda_k^w\}_{k\ge-1}$ and $\{p_{-1}^w(a)\}_{a\in A}$, and a sequence of probabilistic context trees $\{(\tau_k^w,p_k^w)\}_{k\ge0}$. What follows is dedicated to the definition of such a triplet of parameters.
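4.2. A triplet of parameters $(\{\lambda_k^w\}_{k\ge-1},\{p_{-1}^w(a)\}_{a\in A},\{(\tau_k^w,p_k^w)\}_{k\ge0})$. We fix a string $w$ of $A^\star$. The definition of our triplet is based on two partitions of $[0,1[$ inspired by Comets et al. (2002). Let us first show that for any $a\in A$, $i\ge0$, $b_{-i}^{-1}\in I^i(w)$ and $\underline c\in A^{-\mathbb N}$ (whose last $k$ symbols we denote by $c_{-k}^{-1}$)

$$\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)\xrightarrow[k\to+\infty]{}P(a|b_{-i}^{-1}w\underline c). \qquad (15)$$

Observe that

$$0\le\sum_{a\in A}\Big[P(a|b_{-i}^{-1}w\underline c)-\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)\Big] = 1-\sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z).$$

Moreover,

$$\sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)\ge\inf_{i\ge0}\ \inf_{b_{-i}^{-1}\in I^i(w)}\ \inf_{c_{-k}^{-1}\in A^k}\ \sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z) = \alpha_k^w,$$

therefore, under assumption (10), $\sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)$ goes to 1 as $k\to+\infty$, and

$$\sum_{a\in A}\Big[P(a|b_{-i}^{-1}w\underline c)-\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z)\Big]\xrightarrow[k\to+\infty]{}0.$$

Since all the terms in the sum over $A$ are nonnegative, the convergence (15) holds.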

4.2.1. Definition of the first partition of $[0,1[$. We will denote for any $v\in A^{-\mathbb N}\cup A^\star$ such that $m^w(v)<+\infty$

$$\alpha^w(a,v,k) := \inf_{\underline z}P\big(a\,\big|\,v_{-(m^w(v)+|w|+k)}^{-1}\underline z\big)$$

for $k = 0,\ldots,|v|-m^w(v)-|w|$. This notation is not ambiguous since, once we fix $v$ and $k$, we automatically fix $v_{-m^w(v)}^{-1}$ and $v_{-m^w(v)-|w|-k}^{-m^w(v)-|w|-1}$. Now, let us introduce the partition which is illustrated in the upper part of Figure 4. Define for any $a\in A$

$$\alpha(a) := \inf_{\underline z}P(a|\underline z),$$

and the collection of intervals $\{I(a)\}_{a\in A}$, each one having length $|I(a)| = \alpha(a)$. For any $v\in A^\star\cup A^{-\mathbb N}$ such that $m^w(v)<+\infty$ we define the collection of intervals

$$I^w(a,v,k),\qquad a\in A,\ k = 0,\ldots,|v|-m^w(v)-|w|,$$

each one having length

$$|I^w(a,v,k)| = \alpha^w(a,v,k)-\alpha^w(a,v,k-1)$$

for $k\ge1$, and

$$|I^w(a,v,0)| = \alpha^w(a,v,0)-\alpha(a).$$

Suppose now we are given an entire past $\underline z\in\bar A^{-\mathbb N}(w)$, and glue these intervals in the following order

$$I(1),I(2),\ldots,I^w(1,\underline z,0),I^w(2,\underline z,0),\ldots,I^w(1,\underline z,1),I^w(2,\underline z,1),\ldots$$

in such a way that the left extreme of $I(1)$ coincides with 0, and the left extreme of each interval coincides with the right extreme of the preceding interval. What we obtain, by the convergence (15), is a partition of $[0,1[$ such that for any $a\in A$

$$\mathrm{Leb}\Big(I(a)\cup\bigcup_{k\ge0}I^w(a,\underline z,k)\Big) = P(a|\underline z),$$

where Leb denotes the Lebesgue measure on $[0,1[$. It is important to notice that for any $a$, $k$ and $\underline z\in\bar A^{-\mathbb N}(w)$, we can construct the interval $I^w(a,\underline z,k)$ knowing only the suffix $z_{-(k+|w|+m^w(\underline z))}^{-1}$.
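The gluing of the first partition can be sketched in a few lines of Python. We assume a finite alphabet, truncate the partition at a level `K`, and take as given numerical values of $\alpha(a)$ and $\alpha^w(a,\underline z,k)$ for one fixed past $\underline z$; the dictionaries in the test are hypothetical inputs, not computed from a kernel. Intervals are labelled $(a,-1)$ for $I(a)$ and $(a,k)$ for $I^w(a,\underline z,k)$.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]        # (a, -1) for I(a), (a, k) for I^w(a, z, k)


def first_partition(alpha: Dict[int, float],
                    alpha_w: Dict[Tuple[int, int], float],
                    K: int) -> List[Tuple[float, float, Label]]:
    """Glue I(1), I(2), ..., I^w(1,z,0), I^w(2,z,0), ..., up to level K, from 0."""
    A = sorted(alpha)
    pieces = [((a, -1), alpha[a]) for a in A]
    for k in range(K + 1):
        for a in A:
            prev = alpha[a] if k == 0 else alpha_w[(a, k - 1)]
            pieces.append(((a, k), alpha_w[(a, k)] - prev))
    intervals, left = [], 0.0
    for label, length in pieces:
        intervals.append((left, left + length, label))   # left-closed, right-open
        left += length
    return intervals


def locate(u: float, intervals: List[Tuple[float, float, Label]]) -> Label:
    """Label of the interval containing u (linear scan; empty intervals are skipped)."""
    for lo, hi, label in intervals:
        if lo <= u < hi:
            return label
    raise ValueError("u is not covered: increase the truncation level K")
```

With the toy values in the test, the total length carried by symbol 2 is 0.6, which plays the role of $P(2|\underline z)$.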
Figure 4. Illustration of the first partition (upper part) for a given past $\underline z\in\bar A^{-\mathbb N}(w)$, and of the second partition (lower part), which does not depend on the past.



4.2.2. Definition of the second partition of $[0,1[$. Observe that $\{\alpha_k^w\}_{k\ge0}$ is a $[0,1]$-valued non-decreasing sequence which, by (10), converges to 1 as $k$ diverges. It follows that, denoting $\alpha_{-1}^w := \sum_{a\in A}\alpha(a)$ (note that $\alpha_{-1}^w\le\alpha_0^w$) and using the convention $\alpha_{-2}^w := 0$, the sequence of intervals $\{[\alpha_{k-1}^w,\alpha_k^w[\}_{k\ge-1}$ constitutes a partition of $[0,1[$. This partition is illustrated in the lower part of Figure 4.

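4.2.3. Definition of the triplet. Let us introduce an i.i.d. chain $\mathbf U = \{U_i\}_{i\in\mathbb Z}$ of random variables uniformly distributed in $[0,1[$. We denote by $(\Omega,\mathcal F,\mathbb P)$ the corresponding probability space. It is the only probability space we will consider throughout this paper. We now introduce one triplet $(\{\lambda_k^w\}_{k\ge-1},\{p_{-1}^w(a)\}_{a\in A},\{(\tau_k^w,p_k^w)\}_{k\ge0})$ that will give the decomposition stated in Theorem 4.1. Define

• For any $k\ge-1$

$$\lambda_k^w := \mathbb P\big(U_0\in[\alpha_{k-1}^w,\alpha_k^w[\big). \qquad (16)$$

• For any $a\in A$

$$p_{-1}^w(a) := \mathbb P\big(U_0\in I(a)\mid U_0\in[0,\alpha_{-1}^w[\big) = \alpha(a)/\alpha_{-1}^w. \qquad (17)$$

• For any $k\ge0$, let

$$\tau_k^w := A^{-\mathbb N}(w)\cup\bigcup_{i\ge0}\ \bigcup_{b_{-i}^{-1}\in I^i(w)}\ \bigcup_{c_{-k}^{-1}\in A^k}\{c_{-k}^{-1}wb_{-i}^{-1}\}, \qquad (18)$$

and for $v\in\tau_k^w$ (that is, $|v| = m^w(v)+|w|+k$) we put

$$p_k^w(a|v) := \begin{cases}\mathbb P\Big(U_0\in I(a)\cup\bigcup_{l=0}^{k}I^w(a,v,l)\ \Big|\ U_0\in[\alpha_{k-1}^w,\alpha_k^w[\Big) & \text{if } m^w(v)<+\infty,\\[6pt] \dfrac{P(a|v)-\lambda_{-1}^wp_{-1}^w(a)}{1-\lambda_{-1}^w} & \text{otherwise.}\end{cases} \qquad (19)$$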

Two examples of sequences of context trees $\{\tau_k^w\}_{k\ge0}$ on $A = \{1,2\}$ are given in Figures 3 and 5: the first one with $w = 2$, and the second one with $w = 12$.
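Given the thresholds, the weights (16) and the distribution (17) reduce to differences and ratios. A Python sketch, assuming a finite alphabet and a truncated list `alpha_w` $=(\alpha_0^w,\ldots,\alpha_K^w)$ ending at 1; the numbers in the test are hypothetical.

```python
from typing import Dict, List, Tuple


def triplet_weights(alpha: Dict[int, float],
                    alpha_w: List[float]) -> Tuple[Dict[int, float], Dict[int, float]]:
    """lambda_k^w = alpha_k^w - alpha_{k-1}^w (k >= -1, alpha_{-2}^w = 0) as in (16),
    and p_{-1}^w(a) = alpha(a) / alpha_{-1}^w as in (17)."""
    alpha_m1 = sum(alpha.values())                 # alpha_{-1}^w
    bounds = [0.0, alpha_m1] + list(alpha_w)       # alpha_{-2}, alpha_{-1}, alpha_0, ...
    lam = {j - 1: bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)}
    p_minus1 = {a: alpha[a] / alpha_m1 for a in alpha}
    return lam, p_minus1
```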

4.3. Proofs of the results of this section.

Proof of Theorem 4.1. What we have to prove is that equality (11) holds with the triplet $(\{\lambda_k^w\}_{k\ge-1},\{p_{-1}^w(a)\}_{a\in A},\{(\tau_k^w,p_k^w)\}_{k\ge0})$ introduced above. On the one hand, using (15), for any $a\in A$ and $\underline z\in\bar A^{-\mathbb N}(w)$ we have

$$P(a|\underline z) = \mathbb P(U_0\in I(a))+\mathbb P\Big(U_0\in\bigcup_{k\ge0}I^w(a,\underline z,k)\Big) \qquad (20)$$
PERFECT SIMULATION FOR CHAINS WITH INFINITE MEMORY 



Figure 5. The sequence of context trees $\{\tau_k^w\}_{k\ge0}$ on $A = \{1,2\}$ with $w = 12$.
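where the second term can be rewritten

$$\sum_{k\ge0}\mathbb P\Big(U_0\in\bigcup_{l\ge0}I^w(a,\underline z,l),\ U_0\in[\alpha_{k-1}^w,\alpha_k^w[\Big).$$

On the other hand, by the definition of $\alpha_k^w$, $k\ge-1$, we have $I(a)\subset[0,\alpha_{-1}^w[$ and, for any $k\ge0$,

$$[\alpha_{-1}^w,\alpha_k^w[\ \subset\ \bigcup_{a\in A}\bigcup_{l=0}^{k}I^w(a,\underline z,l),$$

so that, on the event $\{U_0\in[\alpha_{k-1}^w,\alpha_k^w[\}$, the union over $l\ge0$ above can be restricted to $l\le k$. It follows that for any $\underline z\in\bar A^{-\mathbb N}(w)$,

$$P(a|\underline z) = \lambda_{-1}^w\,\mathbb P\big(U_0\in I(a)\mid U_0\in[0,\alpha_{-1}^w[\big)+\sum_{k\ge0}\lambda_k^w\,\mathbb P\Big(U_0\in I(a)\cup\bigcup_{l=0}^{k}I^w(a,\underline z,l)\ \Big|\ U_0\in[\alpha_{k-1}^w,\alpha_k^w[\Big).$$

It follows from (17) and (19) that for any $\underline z\in\bar A^{-\mathbb N}(w)$ and $a\in A$

$$P(a|\underline z) = \lambda_{-1}^wp_{-1}^w(a)+\sum_{k\ge0}\lambda_k^wp_k^w(a|c_{\tau_k^w}(\underline z)),$$

and for $\underline z\in A^{-\mathbb N}(w)$ the same identity follows directly from the second case of (19), since then $c_{\tau_k^w}(\underline z) = \underline z$ for every $k\ge0$. □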
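Proof of Proposition 4.1. The fact that (13) implies (14) is clear since $\bar A^k(w)\subset A^k$. Now let us show that (14) implies (10). First, define for any $k\ge0$ and $i\ge0$

$$\alpha_{k,i}^w := \inf_{b_{-i}^{-1}\in I^i(w)}\ \inf_{c_{-k}^{-1}\in A^k}\ \sum_{a\in A}\inf_{\underline z}P(a|b_{-i}^{-1}wc_{-k}^{-1}\underline z),$$

and, for $j\ge|w|$,

$$\gamma_j := \inf_{a_{-j}^{-1}\in\bar A^j(w)}\sum_{a\in A}\inf_{\underline z}P(a|a_{-j}^{-1}\underline z).$$

Since every string formed by $b_{-i}^{-1}$, $w$ and $c_{-k}^{-1}$ contains $w$, it belongs to $\bar A^{k+i+|w|}(w)$, and therefore

$$\alpha_{k,i}^w\ge\gamma_{k+i+|w|}.$$

Condition (14) says that $\gamma_j\to1$ as $j\to+\infty$. Hence

$$1\ge\alpha_k^w = \inf_{i\ge0}\alpha_{k,i}^w\ge\inf_{j\ge k+|w|}\gamma_j\xrightarrow[k\to+\infty]{}1,$$

that is, $\alpha_k^w$ also goes to 1 as $k$ diverges. □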



5. Second main result: perfect simulation 

In this section, we present the perfect simulation algorithm and state the second 
main result of this paper, which gives sufficient conditions for the algorithm to stop 
after a P-a.s. finite number of steps. 

5.1. Explaining how our algorithm works. The algorithm works as a mixture of the algorithm presented in Gallo (2009) and the one of Comets et al. (2002). Assume that the set

$$E := \{a\in A : \inf_{\underline z}P(a|\underline z)>0\}$$

is not empty, and let $E^\star$ denote the set of finite strings of symbols of $E$.

Consider a transition probability kernel $P$ satisfying the condition of Theorem 4.1 with reference string $w\in E^\star$. In order to simplify the notation, we will omit the superscript $w$ in most of the quantities that depend on this string.

We want to get a deterministic measurable function $X : [0,1[^{\mathbb Z}\to A^{\mathbb Z}$, $\mathbf U\mapsto X(\mathbf U)$, such that the law $\mathbb P(X(\mathbf U)\in\cdot)$ is compatible with $P$ in the sense of (2). The idea is to use the sequence $\mathbf U$ together with the partitions of $[0,1[$ introduced before (and illustrated in Figure 4) to mimic the two-step procedure we described in Section 3. In particular, for any $n\in\mathbb Z$, we put $[X(\mathbf U)]_n = a$ whenever $U_n\in I(a)$. Suppose that for some time index $n\in\mathbb Z$ there exists a string $a_{-k}^{-1}\in A^k$ such that $U_{n-i}\in I(a_{-i})$, $i = 1,\ldots,k$; in this case, we put

$$[X(\mathbf U)]_{n-k}^{n-1} = a_{-k}^{-1}.$$

We say that this sample has been spontaneously constructed. Now suppose $U_n\in[\alpha_{l-1},\alpha_l[$ for some $l\ge0$. This means that we pick the context tree $\tau_l$ in the countable mixture representation of $P$, and look whether or not there exists a context in $\tau_l$ which is a suffix of $[X(\mathbf U)]_{n-k}^{n-1} = a_{-k}^{-1}$. If there exists such a context, then we put

$$[X(\mathbf U)]_n = \sum_{a\in A}a\,\mathbf 1\Big\{U_n\in\bigcup_{i=0}^{l}I(a,a_{-k}^{-1},i)\Big\}.$$

If there is no such context (we will write $c_{\tau_l}(a_{-k}^{-1}) = \emptyset$) we cannot construct the state $[X(\mathbf U)]_n$: we need more knowledge of the past. In the first case, $[X(\mathbf U)]_{n-k}^{n}$ has been constructed independently of $U_{-\infty}^{n-k-1}$ and $U_{n+1}^{+\infty}$. Now suppose we want to construct $[X(\mathbf U)]_0$. We generate the $U_i$'s backward in time until the first time $k\le0$ such that we can perform the above construction from time $k$ up to time 0 using only $U_k^0$. A priori, there is no reason for $k$ to be finite. Theorem 5.1 gives sufficient conditions for $k$ to be finite $\mathbb P$-almost surely.
To formalize what we just said, let us define for any $u\in[0,1[$

$$\ell(u) := \sum_{k\ge-1}k\,\mathbf 1\{u\in[\alpha_{k-1},\alpha_k[\}.$$

By Theorem 4.1, $\ell(U_i) = -1$ means that we can choose the state $[X(\mathbf U)]_i$ according to the distribution $p_{-1}(\cdot)$, independently of everything else. On the other hand, $\ell(U_i) = l\ge0$ means that we have to use the context tree $(\tau_l,p_l)$ in order to construct the state $[X(\mathbf U)]_i$. In particular, we recall that for any $l\ge0$ the size of the context $c_{\tau_l}(a_{-\infty}^{-1})$ is $m^w(a_{-\infty}^{-1})+|w|+l$ whenever $w$ appears in $a_{-\infty}^{-1}$.
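Computing $\ell(u)$ amounts to locating $u$ in the second partition, which a binary search does directly. This Python sketch assumes the partition is truncated at some level $K$ with $\alpha_K = 1$; the function name `ell` is ours.

```python
from bisect import bisect_right
from typing import List


def ell(u: float, alpha_m1: float, alpha_w: List[float]) -> int:
    """l(u) = k iff u in [alpha_{k-1}, alpha_k[, k >= -1, with alpha_{-2} = 0.

    alpha_w = [alpha_0, ..., alpha_K] is a truncated second partition ending at 1.
    """
    bounds = [alpha_m1] + list(alpha_w)       # right endpoints alpha_{-1}, ..., alpha_K
    k = bisect_right(bounds, u) - 1           # intervals are left-closed, right-open
    if k >= len(alpha_w):
        raise ValueError("u lies beyond the truncated partition")
    return k
```

Because `bisect_right` skips equal endpoints, an empty interval $[\alpha_{k-1},\alpha_k[$ with $\alpha_{k-1}=\alpha_k$ is never returned.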

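One of the inputs for Algorithm 1 is the update function $F$. It is a measurable function $F : [0,1[\times(\{\emptyset\}\cup A^\star\cup A^{-\mathbb N})\to A\cup\{\star\}$ which uses the part of the past we already know and the uniform random variable to compute the present state. It is defined as follows: for any $a_m^n\in\{\emptyset\}\cup A^\star\cup A^{-\mathbb N}$, with $-\infty<n<+\infty$ and $-\infty\le m\le n+1$,

$$F(u,a_m^n) := \begin{cases}\sum_{a\in A}a\,\mathbf 1\Big\{u\in I(a)\cup\bigcup_{k=0}^{\ell(u)}I(a,a_m^n,k)\Big\} & \text{if } \ell(u) = -1\text{ or }c_{\tau_{\ell(u)}}(a_m^n)\ne\emptyset,\\[6pt] \star & \text{otherwise,}\end{cases} \qquad (21)$$

with the conventions that $a_{n+1}^n = \emptyset$ and, for any context tree $\tau$, $c_\tau(\emptyset) = \emptyset$. When we consider an infinite past $\underline z\in\bar A^{-\mathbb N}(w)$, we have by (20), for any $a\in A$,

$$\mathbb P(F(U_0,\underline z) = a) = \mathbb P\Big(U_0\in I(a)\cup\bigcup_{k\ge0}I(a,\underline z,k)\Big) = P(a|\underline z). \qquad (22)$$

When the update function returns the symbol $\star$, it means that we do not have sufficient knowledge of the past to compute the present state.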
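Definition (21) can be transcribed into a Python closure. This is a sketch under explicit assumptions: the alphabet is finite, the second partition is truncated at a level where it reaches 1, pasts are finite lists stored oldest symbol first, `None` plays the role of the symbol $\star$, and `alpha_w(a, past, k)` is a user-supplied (hypothetical) callable returning $\alpha^w(a,\text{past},k)$. The intervals are glued in the order of Section 4.2.1.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Optional, Sequence

Past = Sequence[int]        # finite past, oldest symbol first
STAR = None                 # stands for the symbol "star" in (21)


def make_update(alpha: Dict[int, float], alpha_w_seq: List[float],
                alpha_w: Callable[[int, Past, int], float],
                w: Sequence[int]) -> Callable[[float, Past], Optional[int]]:
    """Return F(u, past) following (21) (illustrative sketch)."""
    A = sorted(alpha)
    bounds = [sum(alpha.values())] + list(alpha_w_seq)   # alpha_{-1}, alpha_0, ...

    def m_w(past: Past) -> Optional[int]:
        n, L = len(past), len(w)
        for k in range(n - L + 1):
            if list(past[n - k - L:n - k]) == list(w):
                return k
        return None                                      # m^w = +infinity

    def F(u: float, past: Past) -> Optional[int]:
        left = 0.0
        for a in A:                                      # spontaneous intervals I(a)
            if u < left + alpha[a]:
                return a
            left += alpha[a]
        l = bisect_right(bounds, u) - 1                  # here l(u) >= 0
        m = m_w(past)
        if m is None or len(past) < m + len(w) + l:
            return STAR                                  # c_{tau_l}(past) is empty
        for k in range(l + 1):                           # I^w(a, past, k), k = 0..l
            for a in A:
                prev = alpha[a] if k == 0 else alpha_w(a, past, k - 1)
                left += alpha_w(a, past, k) - prev
                if u < left:
                    return a
        raise ValueError("the intervals do not cover u; check the inputs")

    return F
```

The last loop stops at level $\ell(u)$: as in the proof of Theorem 4.1, the intervals of levels $0,\ldots,\ell(u)$ already cover $[\alpha_{\ell(u)-1},\alpha_{\ell(u)}[$.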

We define, for any $m\le n$, the $\mathcal F(U_m^n)$-measurable function $\mathcal E : [0,1[^{n-m+1}\to\{0,1\}$ which takes value 1 if, and only if, we can construct $[X(\mathbf U)]_m^n$ independently of $U_{-\infty}^{m-1}$ and $U_{n+1}^{+\infty}$ using the construction described above. Formally

$$\{\mathcal E(U_m^n) = 1\} := \bigcup_{a_m^n\in A^{n-m+1}}\ \bigcap_{i=m}^{n}\{F(U_i,a_m^{i-1}) = a_i\}.$$

Finally, for any $-\infty<m\le n<+\infty$, we define the regeneration time for the window $[m,n]$ as the first time before $m$ such that the construction described above is successful until time $n$, that is

$$\theta[m,n] := \max\{k\le m : \mathcal E(U_k^n) = 1\} \qquad (23)$$

with the convention that $\theta[m] := \theta[m,m]$.

5.2. The algorithm. This algorithm takes as input two integers $-\infty<m\le n<+\infty$ and the update function $F$, and returns as output the regeneration time $\theta[m,n]$ and the constructed sample $[X(\mathbf U)]_{\theta[m,n]}^n$. The function $F$ contains all the information we need about $P$, and we suppose that it is already implemented in the software used for programming the algorithm.

At each time, the set $B$ contains the sites that remain to be constructed. At first $B = \{m,\ldots,n\}$ and a forward procedure (lines 2-8) tries to construct $[X(\mathbf U)]_m^n$ using $U_m,\ldots,U_n$. If it succeeds, then the algorithm stops and returns $\theta[m,n] = m$ and the constructed sample. If it fails, $B$ is not empty and a backward procedure ("while loop": lines 10-27) begins. In this loop, each time the algorithm cannot construct the next site of $B$, it generates a new uniform random variable backward in time. At each newly generated random variable, the algorithm attempts to go as far as possible in the construction of the remaining sites of $B$ using the uniforms that have been previously generated. Theorem 5.1 gives sufficient conditions for this procedure to stop after a finite number of steps.

Algorithm 1. Perfect simulation algorithm of the sample $[X(\mathbf U)]_m^n$
Input: $m$, $n$, $F$; Output: $\theta[m,n]$, $([X(\mathbf U)]_{\theta[m,n]},\ldots,[X(\mathbf U)]_n)$
1: Sample $U_m,\ldots,U_n$ uniformly in $[0,1[$
2: $i\leftarrow m$, $B\leftarrow\{m,\ldots,n\}$
3: $\theta[m,n]\leftarrow m$, $[X(\mathbf U)]_m^{m-1}\leftarrow\emptyset$
4: while $F(U_i,[X(\mathbf U)]_m^{i-1})\in A$ and $B\ne\emptyset$ do
5:     $[X(\mathbf U)]_i\leftarrow F(U_i,[X(\mathbf U)]_m^{i-1})$
6:     $B\leftarrow B\setminus\{i\}$
7:     $i\leftarrow i+1$
8: end while
9: $i\leftarrow m$
10: while $B\ne\emptyset$ do
11:     $i\leftarrow i-1$
12:     $B\leftarrow B\cup\{i\}$
13:     Sample $U_i$ uniformly in $[0,1[$
14:     while $U_i\in[\alpha_{-1},1[$ do
15:         $i\leftarrow i-1$
16:         $B\leftarrow B\cup\{i\}$
17:         Sample $U_i$ uniformly in $[0,1[$
18:     end while
19:     $[X(\mathbf U)]_i\leftarrow F(U_i,\emptyset)$
20:     $B\leftarrow B\setminus\{i\}$
21:     $t\leftarrow\min B$
22:     while $F(U_t,[X(\mathbf U)]_i^{t-1})\in A$ and $B\ne\emptyset$ do
23:         $[X(\mathbf U)]_t\leftarrow F(U_t,[X(\mathbf U)]_i^{t-1})$
24:         $B\leftarrow B\setminus\{t\}$
25:         $t\leftarrow\min B$
26:     end while
27: end while
28: $\theta[m,n]\leftarrow i$
29: return $\theta[m,n]$, $([X(\mathbf U)]_{\theta[m,n]},\ldots,[X(\mathbf U)]_n)$
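For experimentation, the forward and backward procedures of Algorithm 1 can be transcribed into Python almost step by step. This is a sketch, not a reference implementation: `None` plays the role of the symbol $\star$, pasts are lists stored oldest symbol first, and `toy_F` is a hypothetical update function with $w = 2$ on $A = \{1,2\}$ whose output rule is arbitrary and not derived from any particular kernel. It only has the two properties the algorithm relies on: $F(u,\emptyset)\in A$ when $u<\alpha_{-1}$, and a symbol returned for a known past is unchanged when the past is extended further back. Termination is not checked here; it is what Theorem 5.1 addresses.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# F(u, past) returns a symbol, or None when the known past is too short.
UpdateFn = Callable[[float, Sequence[int]], Optional[int]]


def perfect_sample(m: int, n: int, F: UpdateFn, alpha_m1: float,
                   rng: random.Random) -> Tuple[int, List[int], Dict[int, float]]:
    """Return (theta[m, n], [X_theta, ..., X_n], the uniforms that were used)."""
    U = {j: rng.random() for j in range(m, n + 1)}
    X: Dict[int, int] = {}
    B = set(range(m, n + 1))                   # sites still to be constructed
    i = m
    while B:                                   # forward procedure with U_m..U_n only
        a = F(U[i], [X[j] for j in range(m, i)])
        if a is None:
            break
        X[i] = a
        B.discard(i)
        i += 1
    i = m
    while B:                                   # backward procedure
        i -= 1
        B.add(i)
        U[i] = rng.random()
        while U[i] >= alpha_m1:                # go back until a spontaneous site
            i -= 1
            B.add(i)
            U[i] = rng.random()
        X[i] = F(U[i], [])                     # spontaneous symbol: no past needed
        B.discard(i)
        while B:                               # go forward as far as possible
            t = min(B)
            a = F(U[t], [X[j] for j in range(i, t)])
            if a is None:
                break
            X[t] = a
            B.discard(t)
    return i, [X[j] for j in range(i, n + 1)], U


ALPHA_M1 = 0.4        # hypothetical: I(1) = [0, 0.2[, I(2) = [0.2, 0.4[


def toy_F(u: float, past: Sequence[int]) -> Optional[int]:
    """Hypothetical F with w = 2 and alpha_0 = 0.7, alpha_1 = 0.9, alpha_2 = 1."""
    if u < ALPHA_M1:
        return 1 if u < 0.2 else 2
    l = 0 if u < 0.7 else (1 if u < 0.9 else 2)
    last2 = next((k for k in range(len(past)) if past[-1 - k] == 2), None)
    if last2 is None or len(past) < last2 + 1 + l:
        return None                            # context of tau_l not yet visible
    lo, hi = {0: (0.4, 0.7), 1: (0.7, 0.9), 2: (0.9, 1.0)}[l]
    return 1 if u < (lo + hi) / 2 else 2
```

A convenient check of such an implementation is to replay $F$ forward from the returned $\theta[m,n]$ with the stored uniforms and verify that the same sample is obtained.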

5.3. Statement of the second main theorem.

Theorem 5.1. Consider a kernel $P$ satisfying the conditions of Theorem 4.1 for some string $w\in E^\star$. If the sequence $(\alpha_k^w)_{k\ge0}$ defined by (10) satisfies

$$\sum_{k\ge0}(1-\alpha_k^w)<+\infty\quad\text{or, equivalently,}\quad\prod_{k\ge0}\alpha_k^w>0,$$

then Algorithm 1 stops after a $\mathbb P$-a.s. finite number of steps for any $-\infty<m\le n<+\infty$. Moreover,

$$\sum_{l\ge0}\mathbb P(\theta[0]<-l)<+\infty. \qquad (24)$$

Corollary 5.1. The output of Algorithm 1 is a sample of the unique stationary chain compatible with $P$. Moreover, there exists a sequence of random times $\mathbf T = \mathbf T(\mathbf U)$ which splits the realization $X$ into i.i.d. pieces. More specifically, the random strings $\{([X(\mathbf U)]_{T_i},\ldots,[X(\mathbf U)]_{T_{i+1}-1})\}_{i\ne0}$ are i.i.d. and have finite expected size.
The proof of Corollary 5.1 using the CFTP algorithm and Theorem 5.1 is essentially the same as in Comets et al. (2002) (Proposition 6.1, Corollary 4.1 and Corollary 4.3). We omit these proofs in the present work and just mention the main ideas. The existence statement follows once we observe that Theorem 5.1 implies that one can construct a bi-infinite sequence $X$ satisfying, for any $n\in\mathbb Z$, $X_n = F(U_n,X_{-\infty}^{n-1})$. By (22), this chain is therefore compatible with $P$ in the sense of (2). It is stationary by construction. The uniqueness statement follows from the loss of memory the chain inherits from the existence of almost surely finite regeneration times. The regeneration scheme follows from (24).



6. Proof of Theorem 5.1

Let us explain the main steps of this proof. Studying the random variable $\theta[m,n]$ directly is complicated, because it depends on the construction of the states of $X(\mathbf U)$: in order to construct the next state of the chain, we may need to know the distance to the last occurrence of $w$ in the constructed sample. The idea is to introduce, first, a new random variable $\tilde\theta[m,n]$, defined by (29), which can be used to define a lower bound for $\theta[m,n]$. The advantage of $\tilde\theta[m,n]$ is that its definition depends on the reconstructed sample only through the spontaneous occurrences of $w$. Section 6.1 is dedicated to the definition of this new random variable. After that, the main problem is transformed into the problem of showing that $\tilde\theta[m,n]$ is itself $\mathbb P$-a.s. finite. To solve this new problem, we study in Section 6.2 an auxiliary process $D^{(n)}$, defined by (30). The probability of return to 0 of $D^{(n)}$ is related to the distribution of $\tilde\theta[0,n]$ through equation (31). The proof is concluded in Section 6.3 by studying the chain $D^{(n)}$ (Lemma 6.1).



6.1. Definition of a new random variable 9[m^n]. Define the i.i.d. stochas- 
tic chain Z which takes value Zi — a \i Ui belongs to /(a), and Zi = -k other- 
wise. This chain takes in account only the symbols which appear spontaneously in 
X{\J): [X{\])]i — a whenever Zi = a, and in particular [^(U)]*_, , , j^ — w whenever 

^i-\w\+i ~ '"^' ^'^^ ^^y * ^ ^- ^^ ^^^° define the distance to the last spontaneous 
occurrence of w in Z before time i as 



= inf U > 



Z'-^-} ,=w\ 

i—k—\w\ J 



w whenever ^;^_|^|+i 
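Denote $\ell_i := \ell(U_i)$ and define the random variable

$$L_i := \begin{cases} 0 & \text{if } Z_i \neq \star, \\ m_i + |w| + \ell_i & \text{otherwise.} \end{cases} \qquad (25)$$

Suppose we have already constructed a sample $[X(\mathbf U)]_{-\infty}^{-1}$. Since $[X(\mathbf U)]_{-m_0-|w|}^{-m_0-1} = w$, it follows that $m_0$ is larger than or equal to $m_w([X(\mathbf U)]_{-\infty}^{-1})$. Then, whenever $L_0 > 0$, it is larger than or equal to $m_w([X(\mathbf U)]_{-\infty}^{-1}) + |w| + \ell_0$, which is the size of the suffix of $[X(\mathbf U)]_{-\infty}^{-1}$ we need to know in order to construct $[X(\mathbf U)]_0$.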
Before we define $\bar\theta[m,n]$, let us introduce an intermediary random variable $\theta'[m,n]$ which depends on the spontaneous occurrences of $w$. For any $-\infty < m \le n < +\infty$, let

$$\theta'[m,n] := \max\{k \le m : L_i \le i - k,\ i = k, \ldots, n\}. \qquad (26)$$

Associate to each site $i \in \{\theta'[m,n], \ldots, n\}$ an arrow going from time $i$ to time $i - L_i$. Definition (26) says that no arrow passes time $\theta'[m,n]$, meaning that we can construct $[X(\mathbf U)]_{\theta'[m,n]}^{n}$ knowing only $U_{\theta'[m,n]}^{n}$. Therefore, $C(U_{\theta'[m,n]}^{n}) = 1$. Since $\theta[m,n]$ is the maximum over all time indexes $k \le m$ such that $C(U_k^n) = 1$, it follows that

$$\theta'[m,n] \le \theta[m,n].$$
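The construction of $L_i$ in (25) and of $\theta'[m,n]$ in (26) can be sketched on a finite window of times. This is a minimal illustration, not the simulation algorithm itself: it assumes the values $Z_i$ and $\ell_i = \ell(U_i)$ are already given (the map $\ell(\cdot)$ is not reproduced here), encodes $\star$ by a hypothetical constant `STAR`, and returns `None` when the window is too short to decide.

```python
from typing import Dict, Optional, Sequence

STAR = "*"  # hypothetical encoding of the symbol ⋆ (no spontaneous symbol at time i)


def last_w_distance(Z: Dict[int, object], w: Sequence[object], i: int, lo: int) -> Optional[int]:
    """m_i = inf{k >= 0 : Z_{i-k-|w|}^{i-k-1} = w}, searched only over times >= lo."""
    n_w = len(w)
    k = 0
    while i - k - n_w >= lo:
        if tuple(Z[t] for t in range(i - k - n_w, i - k)) == tuple(w):
            return k
        k += 1
    return None  # no spontaneous occurrence of w inside the window


def arrow_lengths(Z, w, ell, lo: int, hi: int) -> Dict[int, Optional[int]]:
    """L_i of (25): 0 if Z_i is a spontaneous symbol, m_i + |w| + ell_i otherwise."""
    L: Dict[int, Optional[int]] = {}
    for i in range(lo, hi + 1):
        if Z[i] != STAR:
            L[i] = 0
        else:
            m = last_w_distance(Z, w, i, lo)
            L[i] = None if m is None else m + len(w) + ell[i]
    return L


def theta_prime(L: Dict[int, Optional[int]], m: int, n: int, lo: int) -> Optional[int]:
    """theta'[m, n] of (26): the largest k <= m (and >= lo) with L_i <= i - k for i = k, ..., n."""
    for k in range(m, lo - 1, -1):
        # every arrow i -> i - L_i emitted from {k, ..., n} must stay at or after time k
        if all(L[i] is not None and L[i] <= i - k for i in range(k, n + 1)):
            return k
    return None


# Toy window with w = ("a", "b") and hypothetical values ell_i = 0.
Z = {-5: "a", -4: "b", -3: STAR, -2: STAR, -1: "a", 0: STAR, 1: STAR, 2: "b"}
L = arrow_lengths(Z, ("a", "b"), {t: 0 for t in Z}, -5, 2)
print(L)
print(theta_prime(L, 0, 2, -5))  # prints -5
```

Here every arrow emitted from a time labelled $\star$ lands on time $-5$, the start of the only spontaneous occurrence of $w$, so $\theta'[0,2] = -5$ on this toy input.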
The definition of $\bar\theta[m,n]$ is done using the following rescaled quantities. Consider the chain $\bar Z$ defined by

$$\bar Z_i := \begin{cases} 1 & \text{if } U_{i|w|-l+1} \in I(w_{-l}),\ l = 1, \ldots, |w|, \\ \star & \text{otherwise,} \end{cases} \qquad (27)$$

and the rescaled function

$$\bar\ell_i := \left\lceil \frac{\sup\{\ell_j : j = (i-1)|w|+1, \ldots, i|w|\}}{|w|} \right\rceil,$$

where for any $r \in \mathbb{R}$, $\lceil r \rceil$ denotes the smallest integer which is larger than or equal to $r$. Using these rescaled quantities, we define the corresponding random variables

$$\bar m_i := \inf\{k \ge 0 : \bar Z_{i-k-1} = 1\},$$

which is the distance to the last occurrence of $1$ in $\bar Z_{-\infty}^{i-1}$, and

$$\bar L_i := \begin{cases} 0 & \text{if } \bar Z_i = 1, \\ \bar m_i + 1 + \bar\ell_i & \text{otherwise.} \end{cases} \qquad (28)$$

The utility of all these new definitions lies in the fact (proven in detail in Gallo (2009), the only difference being the definition of the function $\bar\ell$) that the rescaled random variable

$$\bar\theta[0,n] := \max\{k \le 0 : \bar L_i \le i - k,\ i = k, \ldots, n\} \qquad (29)$$

satisfies the inequality

$$(\bar\theta[0,n] - 1)|w| + 1 \le \theta'[0, n|w|] \le \theta[0, n|w|]$$

for any $n \ge 0$. All we need to study now is the distribution of $\bar\theta[0,n]$. This is done in Sections 6.2 and 6.3.
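The rescaled quantities (27) and (28), together with $\bar\theta[0,n]$ from (29), can be computed in the same spirit. In this sketch (same hypothetical conventions: `STAR` stands for $\star$, and the values of $Z$ and $\ell$ are given), block $i$ gathers the original times $(i-1)|w|+1, \ldots, i|w|$, and $\bar Z_i = 1$ exactly when $Z$ reads $w$ on that block, which is our reading of the condition in (27). The last line evaluates the lower bound $(\bar\theta[0,n] - 1)|w| + 1$ on a toy input.

```python
from typing import Dict, Optional, Sequence, Tuple

STAR = "*"  # hypothetical encoding of the symbol ⋆


def rescale(Z: Dict[int, object], w: Sequence[object], ell: Dict[int, int],
            lo: int, hi: int) -> Tuple[Dict[int, object], Dict[int, int]]:
    """Zbar_i of (27) and ellbar_i = ceil(max ell_j / |w|) for blocks i = lo, ..., hi."""
    n_w = len(w)
    Zbar: Dict[int, object] = {}
    ellbar: Dict[int, int] = {}
    for i in range(lo, hi + 1):
        times = range((i - 1) * n_w + 1, i * n_w + 1)
        Zbar[i] = 1 if tuple(Z[t] for t in times) == tuple(w) else STAR
        ellbar[i] = -(-max(ell[t] for t in times) // n_w)  # integer ceiling
    return Zbar, ellbar


def rescaled_arrows(Zbar, ellbar, lo: int, hi: int) -> Dict[int, Optional[int]]:
    """Lbar_i of (28): 0 if Zbar_i = 1, mbar_i + 1 + ellbar_i otherwise."""
    Lbar: Dict[int, Optional[int]] = {}
    for i in range(lo, hi + 1):
        if Zbar[i] == 1:
            Lbar[i] = 0
            continue
        # mbar_i = distance to the last 1 in Zbar strictly before block i (None if outside window)
        mbar = next((k for k in range(i - lo) if Zbar[i - k - 1] == 1), None)
        Lbar[i] = None if mbar is None else mbar + 1 + ellbar[i]
    return Lbar


def theta_bar(Lbar, n: int, lo: int) -> Optional[int]:
    """theta_bar[0, n] of (29): the largest k <= 0 (and >= lo) with Lbar_i <= i - k, i = k..n."""
    for k in range(0, lo - 1, -1):
        if all(Lbar[i] is not None and Lbar[i] <= i - k for i in range(k, n + 1)):
            return k
    return None


# Toy window with |w| = 2 and hypothetical ell values: blocks -2, ..., 1 cover times -5, ..., 2.
w = ("a", "b")
Z = {-5: "a", -4: "b", -3: STAR, -2: STAR, -1: "a", 0: "b", 1: "b", 2: STAR}
ell = {t: 0 for t in Z}
ell[2] = 3
Zbar, ellbar = rescale(Z, w, ell, -2, 1)
Lbar = rescaled_arrows(Zbar, ellbar, -2, 1)
tb = theta_bar(Lbar, 1, -2)
print(tb, (tb - 1) * len(w) + 1)  # theta_bar[0, 1] and the induced lower bound for theta[0, 2]
```

Note that (29) has exactly the form of (26) on the rescaled arrows, which is why `theta_bar` mirrors a search over $k$ going backwards from $0$.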

To clarify the relationship between $\bar\theta[0,n]$, $\theta'[0,n|w|]$ and $\theta[0,n|w|]$, let us give a concrete example.

Example. Consider a kernel $P$ satisfying the conditions of Theorem 5.1 with a string $w$ of length $|w| = 3$. Assume we are given a sample $U_{-38}^{6}$ (which we do not specify), to which correspond two samples $Z_{-38}^{6}$ and $\bar Z_{-12}^{2}$, with two sequences of arrows $L_{-38}^{6}$ and $\bar L_{-12}^{2}$. The sample $Z_{-38}^{6}$ together with the sequence of arrows $L_{-38}^{6}$ is illustrated in the lower part of Figure 6, and the sample $\bar Z_{-12}^{2}$ together with the sequence of arrows $\bar L_{-12}^{2}$ is illustrated in the middle part of Figure 6. The loops mean that $L_i = 0$ or $\bar L_i = 0$.

We have sufficient information to determine lower bounds for $\theta[0,6]$. Indeed, we can see on the lower sequence that no arrow emerging from $i \in \{-29, \ldots, 6\}$ goes further than time $-29$, and that $-29$ is the first time in the past satisfying this. Therefore $\theta'[0,6] = -29$. This is a first lower bound for $\theta[0,6]$. Another lower bound can be obtained by looking at the sequence in the middle of the figure: no arrow goes further than time $-12$, meaning that $\bar\theta[0,2] = -12$. Then, as we said, $\bar\theta[0,2]$ satisfies the inequality stated after (29), which allows us to use the lower bound $(\bar\theta[0,2] - 1)|w| + 1 = -38$ for $\theta[0,6]$.

6.2. A new auxiliary chain for the study of $\bar\theta[0,n]$. For any $n \in \mathbb{Z}$, the chain $D^{(n)}$ takes the values $D^{(n)}_i = 0$ for any $i \le n$ and

$$D^{(n)}_i = \left(i - \tau^{(n)}_i - \bar L_i\right) \vee 0, \quad \forall i \ge n+1, \qquad (30)$$

where $\tau^{(n)}_i := \max\{l < i : D^{(n)}_l = 0\}$. The behavior of this chain is explained in the upper part of Figure 6. It is also clear from its definition that if $D^{(i)}_j > 0$ for $j = i+1, \ldots, k$ for some $k \ge i+1$, then, in the process $\bar L$, no arrow emerging from $\{i+1, \ldots, k\}$ passes time $i+1$, meaning that $\bar\theta[i+1,k] = i+1$. More generally, the sequence of chains $(D^{(n)})_{n \in \mathbb{Z}}$ satisfies the equation

$$\{\bar\theta[0,n] < -l\} = \bigcap_{i=-l-1}^{-1} \bigcup_{k=i+1}^{n} \{D^{(i)}_k = 0\}.$$

Figure 6. Illustration of the inequalities of (29) (samples of $Z$, $L$, $\bar Z$ and $\bar L$) and of the behavior of the chains $D^{(i)}$, for $i = -13, -11, -8$, constructed using the samples of $\bar Z$ and $\bar L$.



Indeed, we can show in a similar way as in Section 6 of Gallo (2009), the only difference being the definition of the function $\bar\ell$, that

$$\mathbb{P}(\bar\theta[0,n] < -l) \le \sum_{k \ge \lfloor \cdots \rfloor} \mathbb{P}(D^{(0)}_k = 0), \qquad (31)$$

where $\lfloor r \rfloor$ denotes the integer part of $r$. This inequality relates the distribution we are interested in to the probability of return to $0$ of the chain $D^{(0)}$.
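The chain (30) and the set identity displayed before (31) lend themselves to a brute-force check. The sketch below builds $D^{(n)}$ directly from (30) and compares, on random nonnegative integer arrow lengths $\bar L_i$ (purely hypothetical test data, not produced by any kernel), the event $\{\bar\theta[0,n] < -l\}$ with $\bigcap_{i}\bigcup_{k}\{D^{(i)}_k = 0\}$. It checks only this identity, not inequality (31).

```python
import random
from typing import Dict


def d_chain(Lbar: Dict[int, int], n: int, hi: int) -> Dict[int, int]:
    """D^{(n)} of (30) on times n, ..., hi (D_i = 0 for i <= n)."""
    D = {n: 0}
    tau = n  # tau_i: last time before i at which D^{(n)} vanished
    for i in range(n + 1, hi + 1):
        D[i] = max(i - tau - Lbar[i], 0)
        if D[i] == 0:
            tau = i
    return D


def theta_bar_below(Lbar: Dict[int, int], n: int, l: int) -> bool:
    """True iff theta_bar[0, n] < -l: every k in {-l, ..., 0} is passed by some arrow."""
    return all(any(Lbar[i] > i - k for i in range(k, n + 1)) for k in range(-l, 1))


def returns_event(Lbar: Dict[int, int], n: int, l: int) -> bool:
    """True iff for every i in {-l-1, ..., -1}, D^{(i)} hits 0 somewhere in {i+1, ..., n}."""
    return all(any(d_chain(Lbar, i, n)[k] == 0 for k in range(i + 1, n + 1))
               for i in range(-l - 1, 0))


def count_mismatches(trials: int = 500, seed: int = 0) -> int:
    """Compare both events on random (hypothetical) arrow lengths; returns the number of disagreements."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        n, l = rng.randint(0, 5), rng.randint(0, 6)
        Lbar = {t: rng.choice([0, 0, 1, 2, 3, 5]) for t in range(-l - 1, n + 1)}
        bad += theta_bar_below(Lbar, n, l) != returns_event(Lbar, n, l)
    return bad


print(count_mismatches())  # prints 0
```

The agreement is exact: $D^{(i)}$ stays positive on $\{i+1, \ldots, n\}$ precisely when every arrow emitted there ends at or after time $i+1$.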
6.3. Finishing the proof of Theorem 5.1. Owing to inequality (31), the proof of Theorem 5.1 is done if we prove the following lemma.

Lemma 6.1. Under the conditions of Theorem 5.1,

$$\sum_{k \ge 1} \mathbb{P}(D^{(0)}_k = 0) < +\infty.$$



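Proof. For clarity of presentation, let us consider the chain $E^{(0)}$, which is defined using $D^{(0)}$ as follows: $E^{(0)}_i = 0$ for $i \le 0$, and for $i \ge 1$

$$E^{(0)}_i := \begin{cases} D^{(0)}_i & \text{whenever } D^{(0)}_i = i - \tau^{(0)}_i \text{ or } D^{(0)}_i = 0, \\ E^{(0)}_{i-1} & \text{otherwise.} \end{cases} \qquad (32)$$

The behavior of $E^{(0)}$ is easier to understand than that of $D^{(0)}$, and their relationship is illustrated in Figure 7 for given samples of $\bar Z$ and $\bar L$. At time $j > 0$, supposing that $E^{(0)}_{j-1} = n > 0$,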



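$$E^{(0)}_j = \begin{cases} j - \tau^{(0)}_j & \text{if } U_{j|w|-l+1} \in I(w_{-l}) \text{ for all } l = 1, \ldots, |w| \ (\text{i.e. } \bar Z_j = 1), \\ n & \text{if } \alpha_{-1} \le U_{j|w|-l+1} < \alpha_{(n-1)|w|} \text{ for all } l = 1, \ldots, |w|, \\ 0 & \text{if } U_{j|w|-l+1} \ge \alpha_{(n-1)|w|} \text{ for some } l = 1, \ldots, |w|. \end{cases} \qquad (33)$$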
It is clear that $\mathbb{P}(D^{(0)}_k = 0) = \mathbb{P}(E^{(0)}_k = 0)$ and that the state $0$ is a renewal state for $E^{(0)}$. It follows from Feller (1968, Chapter XIII.10, Theorem 1) that $\mathbb{P}(E^{(0)}_k = 0)$ is summable in $k$ if and only if the state $0$ is transient. Denote by $\zeta$ the first time after time $0$ that the chain $E^{(0)}$ returns to the state $0$, and for $k \ge 1$ put $f_k := \mathbb{P}(\zeta = k)$.
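The renewal criterion invoked here can be made concrete numerically. For a hypothetical first-return law $f_k = \mathbb{P}(\zeta = k)$ with finite support, the renewal equation $u_0 = 1$, $u_n = \sum_{k=1}^{n} f_k u_{n-k}$ yields $u_n = \mathbb{P}(E^{(0)}_n = 0)$, and $\sum_{n \ge 0} u_n = 1/(1 - \sum_k f_k)$ when $\sum_k f_k < 1$ (transient case), while the partial sums grow without bound when $\sum_k f_k = 1$. The numbers below are illustrative only.

```python
from typing import Dict, List


def renewal_sequence(f: Dict[int, float], N: int) -> List[float]:
    """u_0, ..., u_N with u_0 = 1 and u_n = sum_{k=1}^n f_k u_{n-k} (renewal equation)."""
    u = [1.0] + [0.0] * N
    for n in range(1, N + 1):
        u[n] = sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1))
    return u


# Transient case: total return probability F = 0.5, so sum_n u_n should be 1 / (1 - F) = 2.
u_transient = renewal_sequence({1: 0.2, 2: 0.3}, 200)
print(round(sum(u_transient), 9))  # 2.0

# Recurrent case: F = 1 and mean return time 1.5, so u_n tends to 2/3 and the sum diverges.
u_recurrent = renewal_sequence({1: 0.5, 2: 0.5}, 200)
print(round(u_recurrent[-1], 9), sum(u_recurrent) > 100)
```

This is the dichotomy used in the proof: showing $\sum_{i \ge 1} f_i < 1$ is enough to get $\sum_k \mathbb{P}(E^{(0)}_k = 0) < +\infty$.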



Figure 7. Behavior of the chains $E^{(0)}$ and $D^{(0)}$ together, both constructed using the samples of $\bar Z$ and $\bar L$.
We want to show that, under the assumptions of Theorem 5.1, the state $0$ is transient for $E^{(0)}$. Let us denote by $G_{i,M}$ the event $\{\zeta = i\} \cap \{\bar Z_1^M = 1^M\}$ for $M \ge 1$ and $i \ge 1$. If $\bar Z_1^M = 1^M$ for some $M \ge 1$, then $E^{(0)}_M = M$, $G_{i,M} = \emptyset$ for $i \le M$, and for any $i \ge M+1$

$$G_{i,M} = \{\bar Z_1^M = 1^M\} \cap \{E^{(0)}_j \ge M : j = M+1, \ldots, i-2\} \cap \bigcup_{k=M}^{i-1} \{E^{(0)}_{i-1} = k\} \cap \{E^{(0)}_i = 0\}.$$



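The definition of $E^{(0)}$ implies that, on the event $\{\zeta = i\}$ for $i > 1$,

$$\{E^{(0)}_{i-1} = k\} = \{E^{(0)}_j = k \text{ for } j = k, \ldots, i-1\}$$

for $1 \le k \le i-1$. It follows that

$$G_{i,M} = \{\bar Z_1^M = 1^M\} \cap \bigcup_{k=M}^{i-1} \Big( \{E^{(0)}_j \ge M : j = M+1, \ldots, k-1\} \cap \{E^{(0)}_j = k : j = k, \ldots, i-1\} \Big) \cap \{E^{(0)}_i = 0\}.$$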
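Using (33) and the chain $\mathbf U$, one obtains that for $M \ge 1$ and $i \ge M+1$

$$G_{i,M} \subset \bigcup_{k=M}^{i-1} \Big( \underbrace{\{\bar Z_1^M = 1^M\} \cap \{E^{(0)}_j \ge M : j = M+1, \ldots, k-1\}}_{\text{event } \mathcal{B}} \cap \{\bar Z_k = 1\} \cap \underbrace{\bigcap_{l=k+1}^{i-1} \bigcap_{m=(l-1)|w|+1}^{l|w|} \{\alpha_{-1} \le U_m < \alpha_{(k-1)|w|}\}}_{\text{event } \mathcal{C}} \cap \underbrace{\bigcup_{m=(i-1)|w|+1}^{i|w|} \{U_m \ge \alpha_{(k-1)|w|}\}}_{\text{event } \mathcal{D}} \Big), \qquad (34)$$

where the event $\mathcal B$ is $\mathcal F(U_1^{(k-1)|w|})$-measurable, the event $\mathcal C$ is $\mathcal F(U_{k|w|+1}^{(i-1)|w|})$-measurable and the event $\mathcal D$ is $\mathcal F(U_{(i-1)|w|+1}^{i|w|})$-measurable. Therefore, they are independent.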
Recall that $w \in \mathcal{E}^*$ and assume that

$$\inf_{i=1,\ldots,|w|} \inf_{z} P(w_{-i} \mid z) = \varepsilon > 0.$$

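Using the partition

$$\bigcup_{i \ge 1}\{\zeta = i\} = \Big(\bigcup_{i \ge 1}\{\zeta = i\} \cap \{\bar Z_1^M = 1^M\}\Big) \cup \Big(\bigcup_{i \ge 1}\{\zeta = i\} \cap \{\bar Z_1^M \neq 1^M\}\Big), \quad \forall M \ge 1,$$

one obtains the following upper bound (recall that $G_{i,M} = \emptyset$ for $i \le M$):

$$\sum_{i \ge 1} f_i \le \sum_{i \ge M+1} \mathbb{P}(G_{i,M}) + \left(1 - \varepsilon^{M|w|}\right),$$

which holds for any $M \ge 1$. Using the fact that $\alpha_k \le 1$ for any $k$, we have

$$\mathbb{P}(\mathcal B) \le \varepsilon^{|w|M},$$
$$\mathbb{P}(\mathcal C) = \left(\alpha_{(k-1)|w|} - \alpha_{-1}\right)^{|w|(i-k-1)} \le (1 - \alpha_{-1})^{|w|(i-k-1)},$$
$$\mathbb{P}(\mathcal D) \le |w|\left(1 - \alpha_{(k-1)|w|}\right).$$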
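Therefore, (34) gives us the following upper bound for any $i \ge M+1$:

$$\mathbb{P}(G_{i,M}) \le |w|\,\varepsilon^{|w|M} \sum_{k=M}^{i-1} (1 - \alpha_{-1})^{|w|(i-k-1)} \left(1 - \alpha_{(k-1)|w|}\right).$$

We have

$$\sum_{i \ge M+1} \sum_{k=M}^{i-1} (1 - \alpha_{-1})^{|w|(i-k-1)} \left(1 - \alpha_{(k-1)|w|}\right) = \sum_{k \ge M} \left(1 - \alpha_{(k-1)|w|}\right) \sum_{i \ge k+1} (1 - \alpha_{-1})^{|w|(i-k-1)} = \frac{\sum_{k \ge M}\left(1 - \alpha_{(k-1)|w|}\right)}{1 - (1 - \alpha_{-1})^{|w|}},$$

where we interchanged the order of the sums and then summed the geometric series in $i$, which is finite as soon as $\alpha_{-1} > 0$. This last equation yields

$$\sum_{i \ge 1} f_i \le \frac{|w|\,\varepsilon^{|w|M} \sum_{k \ge M}\left(1 - \alpha_{(k-1)|w|}\right)}{1 - (1 - \alpha_{-1})^{|w|}} + \left(1 - \varepsilon^{M|w|}\right). \qquad (35)$$

Under the conditions of Theorem 5.1, $\sum_{k \ge M}(1 - \alpha_{(k-1)|w|})$ goes to $0$ as $M$ increases. Since the right-hand side of (35) equals

$$1 - \varepsilon^{M|w|}\left(1 - \frac{|w| \sum_{k \ge M}\left(1 - \alpha_{(k-1)|w|}\right)}{1 - (1 - \alpha_{-1})^{|w|}}\right),$$

it is strictly smaller than $1$ for some sufficiently large $M$, and it follows that $\sum_{i \ge 1} f_i < 1$, that is, the state $0$ is transient for $E^{(0)}$. This finishes the proof of Lemma 6.1. □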
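Bound (35) is explicit, so for given coefficients one can search numerically for an $M$ making its right-hand side smaller than $1$. The sketch below does this for a purely hypothetical choice $|w| = 2$, $\varepsilon = 0.3$ and $\alpha_k = 1 - 0.4 \cdot 0.9^{k+1}$ for $k \ge -1$ (so $\alpha_{-1} = 0.6$); it does not check that these numbers come from an actual kernel. The infinite tail $\sum_{k \ge M}(1 - \alpha_{(k-1)|w|})$ is truncated after a fixed number of terms, a harmless approximation here because the terms decay geometrically.

```python
from typing import Callable, Optional


def rhs_35(alpha: Callable[[int], float], eps: float, w_len: int, M: int,
           n_terms: int = 10_000) -> float:
    """Right-hand side of (35), with the tail sum over k >= M truncated after n_terms terms."""
    tail = sum(1.0 - alpha((k - 1) * w_len) for k in range(M, M + n_terms))
    geometric = 1.0 - (1.0 - alpha(-1)) ** w_len  # denominator 1 - (1 - alpha_{-1})^{|w|}
    return w_len * eps ** (w_len * M) * tail / geometric + (1.0 - eps ** (M * w_len))


def smallest_good_M(alpha, eps, w_len, M_max: int = 200) -> Optional[int]:
    """First M in 1..M_max for which (35) gives sum_i f_i < 1, i.e. transience of the state 0."""
    for M in range(1, M_max + 1):
        if rhs_35(alpha, eps, w_len, M) < 1.0:
            return M
    return None


def alpha(k: int) -> float:
    """Hypothetical continuity coefficients alpha_k = 1 - 0.4 * 0.9**(k + 1), k >= -1."""
    return 1.0 - 0.4 * 0.9 ** (k + 1)


M = smallest_good_M(alpha, eps=0.3, w_len=2)
print(M, rhs_35(alpha, 0.3, 2, M) < 1.0)  # 9 True
```

As the rewritten form of (35) shows, the threshold only depends on whether $|w| \sum_{k \ge M}(1 - \alpha_{(k-1)|w|}) < 1 - (1 - \alpha_{-1})^{|w|}$; the factor $\varepsilon^{M|w|}$ merely controls how far below $1$ the bound falls.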



7. Conclusion 



Comets et al. (2002) use the uniform continuity assumption $\alpha_k^{CFF} \to 1$. Perfect simulation under a weaker condition was done recently by De Santis & Piccioni (2010), requiring only punctual continuity, i.e., $\alpha_k^{CFF}(a) \to 1$ for any $a$ in the set of "admissible histories" (see Section 2 therein). Our extension allows us to consider kernels $P$ having discontinuities along all the points $a \in A^{-\infty}(w)$, for any $w \in \mathcal{E}^*$, and, a priori, no assumption is made on the set of "admissible histories", so that it is generically the set $A^{-\mathbb{N}}$. More specifically, consider a transition probability kernel $P$ such that $\alpha^{CFF}$ satisfies $\sum_{k \ge 0}(1 - \alpha_k^{CFF}) < +\infty$. It follows that, for any $w \in \mathcal{E}^*$,



$$\sum_{k \ge 0}\left(1 - \alpha^{CFF}_{k|w|}\right) < +\infty.$$

Now consider any $P$ satisfying this summability condition and allowing discontinuities along branches not containing the string $w$. Theorem 5.1 says that we can still perform a perfect simulation of the unique stationary chain compatible with $P$. This shows that our result is a strict generalization of the work of Comets et al. (2002) whenever we are in the regime $\sum_{k \ge 0}(1 - \alpha_k^{CFF}) < +\infty$.

Also, our condition does not necessarily fit under the conditions of De Santis & Piccioni (2010). One can see this by checking Equation (35) of Example 1 in their work. Their notation corresponds to

$$a_0(-1) = \alpha(-1), \quad a_0(1) = \alpha(1) \quad \text{and} \quad a_\infty = \lim_{k} \alpha_k^{CFF}.$$

Taking $w = (-1)(1)$ or $w = (1)(-1)$, Theorem 5.1 says that we only need $a_0(-1)$ and $a_0(1)$ to be strictly positive, without any assumption on $a_\infty$. We also mention that this particular example was already handled by the results of Gallo (2009).



Let us finish with some questions. The condition $\sum_{k \ge 0}(1 - \alpha_{k|w|}) < +\infty$ guarantees that the perfect simulation scheme stops at a finite time $\theta$ which has finite expected value. Can we find weaker conditions such that $\theta$ is finite a.s. but has infinite expectation? Is the minorization assumption on our reference string ($w \in \mathcal{E}^*$) necessary to obtain a practical coupling-from-the-past algorithm for our class of (not necessarily continuous) chains of infinite memory?